53 research outputs found

    Selective labeling: identifying representative sub-volumes for interactive segmentation

    Automatic segmentation of challenging biomedical volumes with multiple objects is still an open research field. Automatic approaches usually require a large amount of training data to model the complex and often noisy appearance and structure of biological organelles and their boundaries. However, due to the variety of biological specimens and the large volume sizes of the datasets, training data is costly to produce, error prone and sparsely available. Here, we propose a novel Selective Labeling algorithm to overcome these challenges: an unsupervised sub-volume proposal method that identifies the most representative regions of a volume. This massively reduced subset of regions is then manually labeled and combined with an active learning procedure to fully segment the volume. Results on a publicly available EM dataset demonstrate the quality of our approach, achieving equivalent segmentation accuracy with only 5% of the training data.
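The core idea above, proposing a small set of representative sub-volumes without supervision, can be illustrated with a simple clustering heuristic. This is a minimal sketch, not the paper's algorithm: it runs a tiny k-means over hypothetical per-region feature vectors and returns the region nearest each centroid.

```python
import numpy as np

def representative_indices(features, k, iters=20, seed=0):
    """Pick k representative regions: cluster region feature vectors
    with a tiny k-means, then return the index nearest each centroid.
    Illustrative only; the paper's proposal method differs."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return sorted(int(dists[:, j].argmin()) for j in range(k))

# Four toy regions forming two clusters; one representative per cluster
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
reps = representative_indices(feats, k=2)
```

In this toy setting the two selected indices land one in each cluster, which is exactly the "representative subset" a human would then label.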

    Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

    Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea: a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-art methods. (Comment: accepted to Proc. European Conf. Computer Vision 2016.)
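The entropy property described above, high scores for a small minority of proposals and low scores for the rest, is easy to sketch. The snippet below is an illustration of the quantity such an objective would penalise, not the paper's actual loss; `confident` and `uncertain` are made-up score vectors.

```python
import numpy as np

def score_entropy(scores):
    """Shannon entropy of softmax-normalised proposal scores.
    A concentrated (detector-like) distribution has low entropy."""
    shifted = np.exp(scores - scores.max())   # stable softmax
    p = shifted / shifted.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

confident = np.array([8.0, 0.1, 0.2, 0.0, 0.1])   # one dominant proposal
uncertain = np.array([1.0, 1.1, 0.9, 1.0, 1.05])  # near-uniform scores
```

Minimising this entropy over proposal scores pushes the common detector toward the peaked score profile of a strongly supervised one.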

    A Diagram Is Worth A Dozen Images

    Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPGs) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.

    Localization-Aware Active Learning for Object Detection

    Active learning - a class of algorithms that iteratively searches for the most informative samples to include in a training dataset - has been shown to be effective at annotating data for image classification. However, the use of active learning for object detection is still largely unexplored, as determining the informativeness of an object-location hypothesis is more difficult. In this paper, we address this issue and present two metrics for measuring the informativeness of an object hypothesis, which allow us to leverage active learning to reduce the amount of annotated data needed to achieve a target object detection performance. Our first metric measures 'localization tightness' of an object hypothesis, which is based on the overlapping ratio between the region proposal and the final prediction. Our second metric measures 'localization stability' of an object hypothesis, which is based on the variation of predicted object locations when input images are corrupted by noise. Our experimental results show that by augmenting a conventional active-learning algorithm designed for classification with the proposed metrics, the amount of labeled training data required can be reduced by up to 25%. Moreover, on the PASCAL VOC 2007 and 2012 datasets our localization-stability method has an average relative improvement of 96.5% and 81.9% over the baseline method using classification only.
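The 'localization tightness' metric is described as the overlap ratio between the region proposal and the final prediction, i.e. an intersection-over-union. A minimal sketch of that computation (the box values below are made up for illustration):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A tight hypothesis: the proposal and the final prediction nearly agree,
# so this sample is considered less informative to annotate
proposal = (10, 10, 50, 50)
prediction = (12, 12, 48, 52)
tightness = iou(proposal, prediction)
```

Low tightness (or high location variation under input noise, for the stability metric) flags the image as worth sending to an annotator.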

    A high performance CRF model for clothes parsing

    In this paper we tackle the problem of clothing parsing: our goal is to segment and classify the different garments a person is wearing. We frame the problem as one of inference in a pose-aware Conditional Random Field (CRF) which exploits appearance, figure/ground segmentation, shape and location priors for each garment, as well as similarities between segments and symmetries between different human body parts. We demonstrate the effectiveness of our approach on the Fashionista dataset and show that we can obtain a significant improvement over the state-of-the-art. (Peer reviewed. Postprint, published version.)

    End-to-end training of object class detectors for mean average precision

    We present a method for training CNN-based object class detectors directly using mean average precision (mAP) as the training loss, in a truly end-to-end fashion that includes non-maximum suppression (NMS) at training time. This contrasts with the traditional approach of training a CNN for a window classification loss, then applying NMS only at test time, when mAP is used as the evaluation metric in place of classification accuracy. However, mAP following NMS forms a piecewise-constant structured loss over thousands of windows, with gradients that do not convey useful information for gradient descent. Hence, we define new, general gradient-like quantities for piecewise constant functions, which have wide applicability. We describe how to calculate these efficiently for mAP following NMS, enabling us to train a detector based on Fast R-CNN directly for mAP. This model achieves equivalent performance to the standard Fast R-CNN on the PASCAL VOC 2007 and 2012 datasets, while being conceptually more appealing as the very same model and loss are used at both training and test time. (Comment: this version has minor additions to the results (ablation study) and discussion.)
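Why mAP is piecewise constant is easy to demonstrate: average precision depends only on the ranking of detection scores, so any score perturbation that leaves the ranking unchanged leaves AP unchanged, and the exact gradient is zero almost everywhere. The sketch below shows this for a single class; it is an illustration, not the paper's loss or its gradient-like quantities.

```python
def average_precision(scores, labels):
    """AP over one ranked list: mean of precision values at each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / max(1, sum(labels))

# Nudging a score without changing the ranking leaves AP unchanged,
# so an exact gradient would be zero and carry no learning signal
ap_a = average_precision([0.90, 0.80, 0.70], [1, 0, 1])
ap_b = average_precision([0.91, 0.80, 0.70], [1, 0, 1])
```

This is the degeneracy the paper's gradient-like quantities for piecewise constant functions are designed to work around.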

    A Weakly Supervised Deep Learning Approach for Detecting Malaria and Sickle Cells in Blood Films

    Machine vision analysis of blood films imaged under a brightfield microscope could provide scalable malaria diagnosis solutions in resource-constrained endemic urban settings. The major bottleneck in successfully analyzing blood films with deep learning vision techniques is a lack of object-level annotations of disease markers such as parasites or abnormal red blood cells. To overcome this challenge, this work proposes a novel weakly supervised deep learning approach that leverages weak labels readily available from routine clinical microscopy to diagnose malaria in thick blood film microscopy. This approach is based on aggregating the convolutional features of multiple objects present in one hundred high-resolution image fields. We show that this method not only achieves expert-level malaria diagnostic accuracy without any hard object-level labels but can also identify individual malaria parasites in digitized thick blood films, which is useful in assessing disease severity and response to treatment. We demonstrate another application scenario where our approach is able to detect sickle cells in thin blood films. We discuss the wider applicability of the approach in automated analysis of thick blood films for the diagnosis of other blood disorders.
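The aggregation step, combining evidence from many detected objects into one slide-level decision under only a weak (slide-level) label, can be sketched as a simple multiple-instance pooling. This toy stand-in is not the paper's architecture; the weight vector `w` and the feature arrays are hypothetical.

```python
import numpy as np

def slide_probability(object_features, w, b=0.0):
    """Aggregate per-object evidence into one slide-level probability by
    mean-pooling per-object logits, a toy multiple-instance sketch of
    aggregating features across many image fields."""
    logits = object_features @ w + b
    return float(1.0 / (1.0 + np.exp(-logits.mean())))

w = np.array([1.0, -0.5])                              # hypothetical weights
infected = np.array([[3.0, 0.2], [2.5, 0.1], [2.8, 0.0]])
healthy = np.array([[-2.0, 0.3], [-1.5, 0.2]])
```

Because only the pooled output is supervised, training needs just the slide-level diagnosis, yet the per-object logits can still be inspected to localise individual parasites.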

    Region-Based Semantic Segmentation with End-to-End Training

    We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, neither of which targets pixel labeling performance at the end of the pipeline. More recent fully convolutional methods are capable of end-to-end training for the final pixel labeling, but resort to fixed patches as spatial support. We show how to modify modern region-based approaches to enable end-to-end training for semantic segmentation. This is achieved via a differentiable region-to-pixel layer and a differentiable free-form Region-of-Interest pooling layer. Our method improves the state-of-the-art in terms of class-average accuracy with 64.0% on SIFT Flow and 49.9% on PASCAL Context, and is particularly accurate at object boundaries. (Comment: ECCV 2016 camera-ready.)

    Web image annotation via subspace-sparsity collaborated feature selection

    The number of web images has been growing explosively due to the development of network and storage technology. These images make up a large amount of current multimedia data and are closely related to our daily life. To efficiently browse, retrieve and organize web images, numerous approaches have been proposed. Since the semantic concepts of the images can be indicated by label information, automatic image annotation becomes an effective technique for image management tasks. Most existing annotation methods use image features that are often noisy and redundant. Hence, feature selection can be exploited for a more precise and compact representation of the images, thus improving the annotation performance. In this paper, we propose a novel feature selection method and apply it to automatic image annotation. There are two appealing properties of our method. First, it can jointly select the most relevant features from all the data points by using a sparsity-based model. Second, it can uncover the shared subspace of the original features, which is beneficial for multi-label learning. To solve the objective function of our method, we propose an efficient iterative algorithm. Extensive experiments are performed on large image databases collected from the web. The experimental results, together with the theoretical analysis, validate the effectiveness of our method for feature selection, demonstrating its feasibility for web image annotation. © 2012 IEEE.
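Sparsity-based joint feature selection of the kind described above is typically realised through row sparsity of a projection matrix: features whose rows have large l2 norm are kept across all labels jointly. The sketch below shows only the final selection step given such a matrix; it is not the paper's iterative solver, and the matrix `W` is a made-up example.

```python
import numpy as np

def select_features(W, k):
    """Rank features by the l2 norm of their rows in a projection matrix
    W (features x labels), an l2,1-style row-sparsity criterion, and
    keep the k strongest. Selection step only, not the full optimiser."""
    row_norms = np.linalg.norm(W, axis=1)
    return sorted(np.argsort(-row_norms)[:k].tolist())

# Rows 0 and 2 carry most weight across both labels, so they survive
W = np.array([[5.0, 5.0],
              [0.1, 0.1],
              [3.0, 0.0],
              [0.2, 0.0]])
kept = select_features(W, k=2)
```

Because the ranking uses whole rows, a feature is selected or discarded jointly for all labels, which is what makes the criterion suit multi-label annotation.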